+++
title = 'Testing characteristics of samples (goodness-of-fit, independence, homogeneity)'
template = 'page-math.html'
+++

# Testing characteristics of samples (goodness-of-fit, independence, homogeneity)

## Goodness-of-fit

Checks whether an observed frequency distribution fits a claimed distribution.
The sample has size n, spread over k different categories.

Hypotheses:
- $H_{0}$: frequency counts agree with the claimed distribution
- $H_{A}$: frequency counts do not agree with the claimed distribution

$O_{i}$ is the observed frequency count of category *i*. $E_{i} = n \times p_{i}$ is the expected frequency count, where $p_{i}$ is the probability of category *i* under the claimed distribution.

The test statistic is:
$\chi^{2} = \sum_{i=1}^{k}\frac{(O_{i} - E_{i})^{2}}{E_{i}}$

and has approximately a chi-square distribution with k − 1 degrees of freedom under the null hypothesis.

Decision rule:

- critical value: reject the null hypothesis if $\chi^{2} > \chi^{2}_{k-1, \alpha}$
- P value: reject the null hypothesis if $P(\chi^{2} \geq x^{2}) < \alpha$, where $x^{2}$ is the observed value of the test statistic

The test is right-tailed, since only large values of the test statistic count as evidence against the null hypothesis (even though the alternative hypothesis is undirected).

## Test of independence

When: two variables measured in a *single sample*.

You have a contingency table with r row categories and c column categories, and check whether the row and column variables are dependent.

- $H_{0}$: row and column variables are independent
- $H_{A}$: row and column variables are dependent

Test statistic:

$\chi^{2} = \sum_{cells} \frac{(O-E)^{2}}{E}$

Under $H_{0}$ it has approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom.

Reject the null hypothesis if $\chi^{2} > \chi^{2}_{(r-1)(c-1), \alpha}$

## Test of homogeneity

When: comparing two or more samples to see if they have the same proportions of characteristics.

There are r different populations (rows) and c different categories (columns) of some variable, and you check the proportions of a characteristic in the populations.

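
As a rough illustration of how an r-by-c table of counts like this turns into the $\chi^{2}$ statistic from the independence section, here is a minimal Python sketch. It assumes the usual expected count for a cell, (row total × column total) / (grand total), which these notes do not state explicitly. The function name `chi_square_statistic` and the counts are made up for the example.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for an r-by-c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Assumed expected count: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up counts: 2 populations (rows), 2 categories (columns)
counts = [[10, 20],
          [20, 10]]
stat, dof = chi_square_statistic(counts)

# Tabulated chi-square critical value for 1 degree of freedom, alpha = 0.05
critical_value = 3.841
reject_h0 = stat > critical_value
print(stat, dof, reject_h0)
```

Every expected count here is 15, so each cell contributes 25/15 and the statistic is 20/3 with 1 degree of freedom. For other table sizes or significance levels the critical value has to be looked up separately.
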

- $H_{0}$: the different populations have the same proportions of some characteristics
- $H_{A}$: the different populations don’t have the same proportions of some characteristics

Test statistic:

$\chi^{2} = \sum_{cells} \frac{(O-E)^{2}}{E}$

Under $H_{0}$ it has approximately a chi-square distribution with (r − 1)(c − 1) degrees of freedom.

Reject $H_{0}$ if the observed $\chi^{2} > \chi^{2}_{(r-1)(c-1),\alpha}$

## Fisher’s exact test for 2-by-2 contingency table

Either:

- $H_{0}$: row and column variables are independent
- $H_{A}$: occurrence of “first column category” is more common in the group of “first row category” than in the group of “second row category”

or:

- $H_{0}$: populations have the same proportion of one characteristic
- $H_{A}$: the proportion of the characteristic is bigger/smaller in one population

Test statistic: the frequency count in cell (1,1), which under $H_{0}$ and given the marginals has a hypergeometric distribution

with parameters n = (first row total), N = (grand total), and k = (first column total).

I guess we don’t need to know how to do this manually.
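
For reference, the hypergeometric calculation is short enough to sketch in Python using only the standard library (`math.comb`, Python 3.8+). This is an illustrative sketch of the right-tailed case ("first column category is more common in the first row group"), not a replacement for a statistics package. The function names and the counts are made up for the example.

```python
from math import comb


def hypergeom_pmf(x, N, n, k):
    """P(cell (1,1) = x) given first row total n, first column total k, grand total N."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def fisher_right_tailed(table):
    """Right-tailed P value: probability of a cell (1,1) count at least as large as observed."""
    (a, b), (c, d) = table
    n = a + b          # first row total
    k = a + c          # first column total
    N = a + b + c + d  # grand total
    largest = min(n, k)  # cell (1,1) cannot exceed either marginal
    return sum(hypergeom_pmf(x, N, n, k) for x in range(a, largest + 1))


# Made-up 2-by-2 table of counts
table = [[3, 1],
         [1, 3]]
p_value = fisher_right_tailed(table)
print(p_value)
```

With these counts the P value is 16/70 + 1/70 = 17/70, so $H_{0}$ would not be rejected at α = 0.05.
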